2017-05-17

Introduction

Outline of Workshop

Our Textbook

ModernDive and Our Syllabi

  • ModernDive's guiding philosophies are deeply intertwined our syllabi
  • Hard to speak of ModernDive in isolation from syllabi and vice-versa
  • Numbers are numbers, but data has context…

Context & Background

Albert's Course

  • Introduction to Statistical & Data Sciences: Webpage and GitHub Repo
  • Administrative:
    • Chief non-econ/bio stats service class at Middlebury
    • 12 weeks each with 3h "lecture" + 1h "lab"
  • Students:
    • ~24 students/section of all years/backgrounds. Only stats class many will take
    • Background: Many had AP stats, some with programming
    • All had laptops that they brought everyday

Albert's Syllabus

  • Topic List
    • First half is data science: data visualization, manipulation, importing
    • Second half is intro stats: sampling, hypothesis tests, CI, regression
  • Evaluation
    • 10%: weekly problem sets
    • 10%: engagement
    • 45%: 3 midterms (last during finals week)
    • 35%: Final projects

Albert's Typical Classtime

  • First 10-15min: Priming topic, either via slides or chalk talk
  • Remainder: Students read over text & do Learning Checks in groups and without direct instructor guidance.

Chester: Social Statistics

What is Different?

What are we doing that's different and why?

  1. Data first! Start with data science via tidyverse, then stats.
  2. Replacing the mathematical/analytic with computational/simulation-based whenever possible.
  3. The above necessitates algorithmic thinking, computational logic and some coding/programming.
  4. Complete reproducibility

1) Data First!

Actual dialogue I had with a student:

1) Data First!

Cobb (TAS 2015): Minimizing prerequisites to research. In other words, focus on entirety of Wickham/Grolemund's pipeline…

1) Data First!

… and not just this part.

1) Data First!

Furthermore use data science tools that a data scientist would use. Example: tidyverse

1) Data First!

What does this buy us?

  • Context for asking scientific questions
  • Look at data that's rich, real, and realistic. Examples: Data packages such as nycflights13 and fivethirtyeight
  • Better motivate traditional statistical topics

2) Computers, Not Math!

Cobb (TAS 2015): Two possible "computational engines" for statistics, in particular relating to sampling:

  • Mathematics: formulas, probability theory, large-sample approximations, central limit theorem
  • Computers: simulations, resampling methods

2) Computers, Not Math!

We present students with a choice for our "engine":

Either we use this… Or we use this…
Drawing Drawing


  • Almost all are thrilled to do latter
  • Leave "bread crumbs" for more advanced math/stats courses

2) Computers, Not Math!

What does this buy us?

  • Emphasizes: stats is not math, rather stats uses math.
  • Simulations are more tactile
  • Reducing probability and march to CLT, this frees up space in syllabus.

3) Algorithms, Computation, & Coding

  • Both "Data First!" and "Computers, Not Math!" necessitate algorithmic thinking, computational logic and some coding/programming.
  • Battle is more psychological than anything:
    • "This is not a class on programming!"
    • "Computers are stupid!"
    • "Learning to code is like learning a foreign language!"
    • "Early on don't code from scratch! Take something else that's similar and tweak it!"
    • Learning how to Google effectively

3) Algorithms, Computation, & Coding

Why should we do this?

  • Data science and machine learning.
  • Where statistics is heading. Gelman blog post.
  • Many new tools, like DataCamp, allow us to outsource many less interesting topics to teach.
  • Bigger picture:
    • Coding is becoming a basic skill like reading and writing.
    • Data analysis, as opposed to algorithms and data structures, could attract more students from traditionally underrepresented groups.

4) Complete Reproducibility

  • Students learn best when they can take can apart a toy (analysis) and then rebuild it (synthesis).
  • Crisis in Reproducibility
  • Ultimately the best textbook is one you've written yourself.
    • Everyone has different contexts, backgrounds, needs
    • Hard to find one-size-fits-all solutions
  • A new paradigm in textbooks? Versions, not editions?

Let's Dive In!

Insert appropriate image

Getting Started

R

  • Chester's Book: Ask how he wants it pitched
  • DataCamp: Intro to R

RStudio vs RStudio Server